A constrained hierarchical rule extraction method based on phrase collocations and high-frequency backbone words
Abstract
The hierarchical phrase-based machine translation model is a popular translation model that combines the advantages of phrase-based and syntax-based translation models. However, because current hierarchical phrase extraction applies no linguistic constraints, it extracts a large number of redundant generalized rules. In this paper, we propose two strategies to constrain the extraction of hierarchical rules and reduce the number of redundant rules. First, we identify phrase collocations with the log-likelihood ratio and require that each phrase collocation be packed as a whole during extraction. Second, we identify backbone words by their frequency and impose the constraint that sub-phrases consisting only of backbone words cannot be replaced with variables. Experimental results show that our methods substantially reduce the number of generalized rules without a significant decrease in BLEU score.
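The two constraints described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds, the restriction to adjacent two-word collocations, and the reading of "packed as a whole" as "a variable span may not cut through a collocation" are all assumptions made for illustration. The log-likelihood ratio follows the standard 2x2 contingency-table form.

```python
import math
from collections import Counter


def _xlogx(k):
    # k * ln(k), with the usual convention that 0 * ln(0) = 0.
    return k * math.log(k) if k > 0 else 0.0


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts."""
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return 2.0 * (cells - rows - cols + _xlogx(n))


def find_collocations(sentences, llr_threshold):
    """Adjacent word pairs whose LLR exceeds a (hypothetical) threshold."""
    bigrams, first, second = Counter(), Counter(), Counter()
    for sent in sentences:
        for a, b in zip(sent, sent[1:]):
            bigrams[(a, b)] += 1
            first[a] += 1
            second[b] += 1
    n = sum(bigrams.values())
    found = set()
    for (a, b), k11 in bigrams.items():
        k12 = first[a] - k11          # a followed by something other than b
        k21 = second[b] - k11         # b preceded by something other than a
        k22 = n - k11 - k12 - k21     # neither a first nor b second
        # Keep only positive associations (observed count above expected).
        if k11 * n > first[a] * second[b] and llr(k11, k12, k21, k22) > llr_threshold:
            found.add((a, b))
    return found


def find_backbone_words(sentences, min_count):
    """Assumption: backbone words are those seen at least min_count times."""
    counts = Counter(w for sent in sentences for w in sent)
    return {w for w, c in counts.items() if c >= min_count}


def may_generalize(tokens, start, end, collocations, backbone):
    """Return True if tokens[start:end] may be replaced with a variable."""
    sub = tokens[start:end]
    # Constraint 2: a sub-phrase made only of backbone words stays lexical.
    if all(w in backbone for w in sub):
        return False
    # Constraint 1: the span must not split a collocation.
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) in collocations:
            if (start <= i < end) != (start <= i + 1 < end):
                return False
    return True
```

In a rule extractor, a check such as `may_generalize` would be applied to each candidate sub-phrase before it is replaced with a nonterminal, so rejected spans never produce a generalized rule.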
Similar resources
TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models
This paper presents a hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge. The algorithm first extracts all the noun phrase collocations from a shallow parsed corpus by using syntactic knowledge in the form of phrase rules. It then removes pseudo collocations by using a set of statistic-based association measures (...
A Hybrid Extraction Model for Chinese Noun/Verb Synonym bi-gram Collocations
Statistical-based collocation extraction approaches suffer from (1) low precision rate because high co-occurrence bi-grams may be syntactically unrelated and are thus not true collocations; (2) low recall rate because some true collocations with low occurrences cannot be identified successfully by statistical-based models. To integrate both syntactic rules as well as semantic knowledge into a s...
Left-to-Right Hierarchical Phrase-based Machine Translation
Hierarchical phrase-based translation (Hiero for short) models statistical machine translation (SMT) using a lexicalized synchronous context-free grammar (SCFG) extracted from word-aligned bitexts. The standard decoding algorithm for Hiero uses a CKY-style dynamic programming algorithm with time complexity O(n^3) for source input with n words. Scoring target language strings using a language mod...
Next or Beyond Next: Effect of Contrastive Phrase-Based Treatment on Stage Gain Across Self-Paced and More Time-Constrained Tasks
This study explored the effect of contrastive phrase resynthesis instruction on gaining the teachability hypothesis stages in self-paced versus time-constrained oral production and recognition. Three groups (i.e., 23 learners) of high beginner female learners in an English language institute were randomly selected from a cohort of learners. One group received contrastive metalinguistic instruction ...
Improving Collocation Extraction for High Frequency Words
The purpose of this paper is to introduce an alternative word association measure aimed at addressing the under-extraction of collocations that contain high frequency words. While measures such as MI provide the important contribution of filtering out sheer high frequency of words in the detection of collocations in large corpora, one side effect of this filtering is that it becomes correspondingl...